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Final  Progress  Report 
Executive  Summary 

A  major  challenge  in  combustion  chemistry  is  to  be  sure  that  all  of  the  important  reactions  have  been  included 
in  the  kinetic  model.  Experience  shows  that  missing  reactions  are  a  major  cause  of  discrepancies  between 
model  predictions  and  experimental  performance.  Looking  at  combustion  chemistry  as  motion  on  a  Potential 
Energy  Surface  landscape,  reactions  are  the  passes  that  connect  valleys  representing  the  different  molecules; 
whether  or  not  two  valleys  are  connected  is  the  key  issue  determining  the  topology  of  the  landscape. 
Conventional  kinetic  modeling  approaches  include  only  "well  known"  types  of  reactions  in  the  models;  as 
demonstrated  in  this  project  these  conventional  models  omit  most  of  the  actual  reactions  occurring  in  the 
engine.  Discovering  new  reactions  is  very  challenging,  so  historically  many  of  the  most  important  reactions 
have  been  discovered  by  serendipity  (either  experimentally  or  by  using  quantum  chemistry).  In  this  project 
we  develop  a  much  more  systematic  approach,  using  high  performance  computing  to  search  for  reactions 
connecting  a  specified  reactant  to  each  possible  product,  and  using  quantum  chemistry  to  determine  the 
reaction  barriers.  The  algorithm  developed  here  has  4  key  steps:  (1)  identification  of  all  the  potential  product 
channels;  (2)  Search  for  a  low  energy  reaction  path  connecting  reactant  and  each  product  channel;  and  (3) 
refinement  of  the  reaction  path  to  determine  the  saddle  point  geometry  and  barrier  height.  This  new 
computational  approach  is  found  to  be  the  most  effective  method  so  far  at  discovering  novel  chemical 
reactions.  Using  the  new  algorithm  we  have  discovered  38  new  exothermic  or  mildly  endothermic  reactions 
in  this  project;  about  half  of  these  newly-discovered  reactions  were  completely  surprising.  Possible  future 
research  directions  to  make  this  promising  algorithm  even  more  reliable,  computationally  efficient,  and  useful 
for  engine  modeling  purposes  are  discussed  briefly. 
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Introduction 


Currently,  new  chemical  understanding  is  driven  mainly  by  the  experimental  discovery  of  new  compounds 
and  their  reactivity.  This  experimental  work  is  expensive  and  time-consuming,  but  the  payoff  for  discovering 
new  chemical  reactions  is  enormous  in  terms  of  both  fundamental  understanding  and  practical  engineering. 
Examples  of  important  new  reactions  discovered  in  recent  decades  include  many  advances  that  followed  from 
the  discoveries  of  olefin  metathesis^  and  click  chemistry.^  Theoretical,  algorithmic  and  computational 
advances  in  quantum  chemistry  over  the  last  two  decades  have  brought  it  to  the  stage  where  it  is  widely 
applied  for  qualitative  interpretation  of  such  new  experimental  findings,  and  increasingly  used  to  construct 
detailed  quantitative  descriptions  of  chemical  reaction  networks. 

The  main  parameters  in  chemical  kinetic  models  -  equilibrium  constants  and  rate  coefficients  -  are 
increasingly  derived  from  quantum  mechanical  calculations  on  stationary  points  (local  minima  and  first-order 
saddle  points)  along  the  potential  energy  surface  (PES).  Local  minima,  i.e.  reactants,  intermediates,  and 
products,  are  generally  easy  to  characterize  due  to  simple  bonding  and  because  the  negative  of  the  gradient 
along  the  PES  always  points  downhill.  Several  robust  algorithms  were  developed  for  identifying  local  minima.^ 
Saddle  points,  which  correspond  to  transition  states,  remain  a  challenge  for  systematic  characterization  due 
to  non-standard  variations  in  reactive  centers  and  because  a  transition  structure  optimization  must  step  uphill 
in  one  direction  and  downhill  in  all  other  orthogonal  directions.  At  present,  most  saddle  points  are  found 
based  on  human  intuition,  derived  primarily  from  experience  on  analogous  reactions:  the  human  provides  a 
very  good  initial  guess  at  the  saddle  point  geometry,  and  then  various  algorithms  (so-called  single-ended)  are 
used  to  refine  the  initial  guess  geometry  to  find  the  true  saddle  point. ^  ^  This  has  typically  been  applied  to 
cases  where  humans  expect  a  reaction  to  connect  the  reactant(s)  and  product(s),  and  so  it  also  rarely 
discovers  anything  surprising.  This  conventional  approach  turns  out  to  be  quite  good  for  finding  "expected 
reaction"  saddle  points,  but  rarely  discover  anything  that  is  unexpected.  As  an  alternative  to  human  input, 
the  initial  guess  structure  for  the  saddle  point  can  be  generated  by  methods  (so-called  double-ended 
searches)  which  reconstruct  the  reaction  path  between  known  reactant  and  product  sides. 

Most  large  chemical  kinetic  models  are  constructed  in  ways  which  do  not  allow  any  unexpected  reactions  to 
be  included;  this  is  true  whether  the  model  is  constructed  by  hand  or  using  automatic  mechanism 
generators.^"*  Incorporation  of  even  a  simple  new  reaction  in  chemical  kinetic  models  can  significantly  alter 
these  models  since  almost  certainly  it  will  be  the  first  member  of  a  new  large  family  of  related  reactions.  For 
instance,  several  unexpected  low-barrier  reactions  for  JP-10  pyrolysis^^  and  for  decomposition  of  the 
ketohydroperoxides,^®  which  are  critical  intermediates  in  alkane  autooxidation  and  ignition  chemistry,  were 
recently  discovered  in  our  group.  Including  the  new  unexpected  reactions  and  analogous  new  reactions  in  the 
kinetic  models  significantly  improved  the  reliability  of  the  kinetic  model  predictions.^'' 

All  this  experience  suggests  that  if  we  search  for  them  systematically,  we  can  expect  to  find  many 
"unexpected"  reactions  which  are  omitted  from  existing  chemistry  models,  and  that  incorporating  these  new 
reactions  will  significantly  improve  the  reliability  of  the  model  predictions.  In  the  present  work,  we  propose 
a  systematic  approach  for  discovery  new  chemical  reactions  with  minimum  human  effort.  The  method 
advances  recent  work  on  saddle  point  optimization  algorithms,  by  making  use  of  both  single-  and  double- 
ended  methods  in  a  fully  automated  procedure.  Among  a  great  variety  of  methods,"*'^^  we  employ  a 
combination  of  the  Berny  optimization  algorithm  as  implemented  and  improved  by  Schlegel®'®  and  the 
freezing  string  method^^  as  both  methods  were  readily  available  to  us  at  the  time  we  started  this  project.  We 
find  that  this  combination  works  fairly  well  for  several  systems  of  combustion  and  atmospheric  chemistry 
importance.  We  were  able  to  detect  not  all  'known'  reaction  pathways,  manually  detected  in  the  previous 
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studies,  but  also  new,  previously  'unknown',  reaction  pathways,  which  involve  significant  atom 
rearrangements.  The  present  work  is  complementary  to  recent  developments  in  computational  chemistry^'*-^® 
which  demonstrate  a  tendency  of  switching  from  a  traditional  secondary  role  of  theory  in  interpreting 
experimental  findings  in  chemistry  to  a  powerful  predictive  tool  for  analysis  of  chemical  reactivity,  discovery 
of  chemical  reaction  pathways,  and  generation  of  reaction  mechanisms  prior  to  any  experiments. 


Computational  Approach 

Overview 

We  have  developed  algorithms  for  comprehensively  enumerating  possible  products  from  a  given  reactant 
using  Ugi's  matrix  representation  of  both  molecular  structures  and  chemical  reactions.  The  products  are 
sorted  by  thermochemistry  (so  the  most  exothermic  reactions  are  explored  first;  by  the  Hammond  postulate 
these  usually  have  the  lowest  barriers)  and  by  the  number  of  bonds  broken  and  formed  (which  correlates 
with  the  Arrhenius  A  factor).  The  thermochemistry  is  computed  using  Benson-type  group  additivity  methods 
or  quantum  chemistry  at  the  user's  option.  The  computer  then  automatically  searches  for  a  low-energy 
reaction  path  connecting  the  reactant  to  the  products,  using  the  Freezing-String  Algorithm  and  Density 
Functional  Theory.  The  reaction  path  is  then  refined  using  a  Berny-type  search  to  find  the  saddle  point  (i.e. 
the  top  of  the  mountain  pass),  and  the  barrier  height  is  computed  using  the  user-specified  level  of  quantum 
chemistry. 

The  overall  algorithm  was  designed  and  implemented  by  this  project.  To  speed  the  development  and  to 
leverage  prior  research,  we  connected  our  software  with  several  different  packages:  RMG  for  estimating 
thermochemistry,  MarvinBeans  for  converting  the  matrix  representations  into  initial-guess  geometries  for 
each  conformation,  the  Merck  force  field  MMFF94  for  refining  the  initial  geometries,  Q-Chem  for  the 
Freezing-String  reaction  path  search  using  DFT,  and  GAUSSIAN  for  refining  the  transition  state  and  for  high- 
accuracy  (e.g.  CCSD(T))  barrier  height  calculations. 

Molecules  Used  to  Test  the  New  Algorithm 

We  have  applied  the  new  algorithm  to  reactions  of  several  different  types  of  chemical  species  expected  to  be 
important  in  engines:  (1)  the  ketohydroperoxide  HOOCH2CH2CHO,  representative  of  the  chain-branching 
species  important  in  diesel  ignition;  (2)  N-(hydroperoxymethyl)  formamide  HOOCH2NHCHO,  representative 
of  the  key  intermediate  in  oxidative  degradation  of  fuel  detergents  and  the  dispersants  in  jet  and  diesel  engine 
lubricants;  (3)  the  cyclic  peroxides  formed  from  HOOCH2CH2CHO  and  HOOCH2NHCHO;  (4)  the  related 
secondary  ozonide  and  the  corresponding  N-containing  compound  formed  in  pollutant  plumes;  (5)  1,4- 
pentadiene,  representative  of  multiply  unsaturated  hydrocarbons  formed  in  fuel-rich  combustion.  These  test 
molecules  were  chosen  to  cover  a  wide  range  of  chemistry  important  in  engines,  and  a  variety  of  molecular 
structures,  including  heteroatoms  and  a  mix  of  cyclic  and  acyclic  species. 

Technical  Details  of  the  Computations 
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1.  Matrix  Representation  of  Species  and  Reactions 


There  are  different  ways  to  describe  mathematically  compounds  presented  in  a  system. In  the  present 
algorithm,  we  chose  the  Bond  Electron  (BE)  matrices“'^°  as  one  of  the  most  vivid  and  convenient 
representations.  A  BE  matrix  R  for  a  compound  with  n  atoms  ri/h-.rn  is  a  square  matrix  of  dimension  n.  The 
row  of  this  matrix  corresponds  to  electron  configuration  of  atom  r,.  The  Ry  element  represents  the  number 
of  covalent  bonds  between  atoms  r\  and  q  and  matrix  R  is  symmetric,  i.e.  R\j  =  Ry.  The  diagonal  elements  Rw 
represent  the  number  of  free  valence  electrons  which  are  not  involved  in  bonds.  The  sum  over  the  entries  of 
the  row  or  column  of  a  BE-matrix  is  the  number  of  valence  electrons  which  belong  to  the  atom  n. 

The  total  matrix  of  the  initial  reactive  system  Rsum  is  depicted  by  a  block  matrix,  each  block  representing  one 
compound.  A  chemical  reaction  is  the  conversion  of  the  system  by  the  redistribution  of  valence  electrons.  It 
is  represented  by  transformation  of  the  initial  matrix  Rsum  by  the  addition  of  the  reaction  matrix  A  to  yield  a 
product  total  matrix  Psum  with  altered  connectivity.  The  reaction  matrix  A  is  also  symmetric  and  its  elements 
represent  the  appearance  (positive  value)  or  the  disappearance  (negative  value)  of  the  localized  valence 
electrons.  This  matrix  should  conserve  the  total  number  of  electrons  in  the  system,  i.e. 

Xa,  =0 

ij 

Usually  in  elementary  reactions  |aij|<3  since  the  change  by  two  implies  a  significant  chemical  process,  for 
instance,  formation  or  rupture  of  a  double  bond  in  a  single  elementary  step. 

After  applying  the  reaction  matrix  A,  the  product  matrix  P  can  then  be  rearranged  to  form  a  block  structure 
to  represent  the  product  molecules.  If  the  product  matrix  satisfies  the  mathematical  and  chemical  constraints, 
the  reaction  pathway  is  considered  to  be  feasible  and  is  added  to  the  list  of  channels  for  further 
thermochemistry  calculations. 

2.  Reducing  the  Size  of  the  Search  Space 

Thermochemistry  Calculations.  Even  for  a  small  hydrocarbon  system,  the  number  of  reaction  pathways  which 
can  be  generated  using  the  BE  matrix  representation  described  above  can  be  massive,  increasing 
exponentially  with  the  number  of  carbon  atoms.  The  thermodynamic  reaction  control  principle  can  be 
implemented  to  narrow  the  search  space:  highly  endothermic  reactions  are  likely  to  be  too  slow  to  be 
important.  The  relative  stability  of  the  products  can  be  estimated  using  their  thermochemical  properties,  and 
reaction  channels  with  less  stable  products  can  be  eliminated.  However,  on-the-fly  quantum-chemical 
calculation  of  thermochemical  properties  for  hundreds  (or  even  thousands)  species  using,  e.g.  density 
functional  theory  (DFT)  methods,  can  also  be  computationally  very  demanding.  One  efficient  alternative  way 
to  do  so  is  to  use  group  contribution  methods  that  estimate  the  thermochemistry  of  a  molecule  based  on  the 
sub-molecular  fragments  present  in  the  molecule.  The  Benson  group  additivity  framework  is  such  an  example 
of  a  group  contribution  method  that  has  proven  to  provide  accurate  estimates  of  the  ideal  gas 
thermochemistry  for  a  large  range  of  molecules.  Benson's  group  additivity  approach^^  divides  a  molecule  into 
functional  groups,  and  the  contribution  of  each  functional  group  to  the  overall  thermochemistry  is  included. 
The  group  additivity  enthalpy  estimate  is  typically  within  a  few  kcal/moP^  of  the  truth,  sufficient  to  reliably 
separate  very  endothermic  reactions  from  the  rest. 

Generating  3D  Geometries.  While  DFT  methods  can,  in  principle  be  implemented  to  generate  and  optimize 
the  starting  and  final  3D  geometries,  the  large  number  of  reaction  channels  can  prohibit  such  calculations.  A 
much  faster  route  is  to  use  a  molecular  mechanics  force  field.  Though,  in  principle,  any  type  of  reasonable 
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force  field  can  be  used,  we  choose  Merck  Molecular  force-fields  (MMFF94)  developed  by  Merck  Research 
Laboratories^®'^^  which  were  specifically  optimized  for  a  wide  range  of  organic  chemistry  calculations  (from 
very  small  molecules  to  proteins).  The  method  includes  parameters  from  high-quality  quantum  calculations 
(rather  than  experimental  data)  for  a  wide  range  of  atom  types  including  those  presented  in  the  examples 
studied  in  this  work. 

3.  Locating  Transition  States 

As  mentioned  in  Introduction,  methods  for  transition  state  structure  search  can  be  classified  as  single-ended 
and  double-ended.^'®  The  single-ended  methods  refine  an  initial  guess  of  the  transition  state  to  the  exact 
answer  by  exploring  the  PES  using  local  gradient  and  usually  the  second  derivative  information.  The  Berny 
saddle  point  optimization  algorithm  is  one  of  these  methods,  which  is  widely  used  in  ab  initio  calculations 
primarily  due  its  reliability.®'®  Berny  geometry  optimization  starts  with  an  initial  guess  for  the  second 
derivative  matrix  (Flessian)  which  is  determined  using  connectivity  derived  from  atomic  radii  and  a  simple 
valence  force  field. The  approximate  Flessian  matrix  is  improved  at  each  point  using  the  first  derivatives 
and  energies  computed  along  the  optimization  pathway.  The  main  input  for  such  algorithm  is  a  guess 
structure  that  should  be  located  very  close  to  the  transition  state  in  order  to  succeed.  At  present,  initial  guess 
geometries  are  usually  found  based  on  human  intuition,  and  automated  generation  of  good  initial  guess 
transition  state  structures  is  the  main  point  of  the  procedure  proposed  here. 

The  double-ended  methods  start  from  the  reactants  and  products  and  work  from  both  sides  to  find  the 
transition  structure  and  the  reaction  path.  These  include  the  nudged  elastic  band  method,®'^  string 
method,^^'^^  and  freezing  string  method  (FSM).^®  All  these  algorithms  generate  a  sequence  of  configurations 
located  between  the  reactant  and  products  geometries.  In  the  present  algorithm  we  choose  FSM  as  it  does 
not  require  Flessian  calculations  and  provides  one  of  the  most  inexpensive  algorithms  for  determining  a  path 
with  a  specified  number  of  intermediate  structures  (nodes)  connecting  the  known  reactant(s)  and  product(s). 
FSM  starts  from  both  ends  of  the  path,  and  adds  nodes  irreversibly  until  the  two  ends  join.  FSM  partially 
optimizes  intermediate  structures  orthogonally  to  the  reaction  path,  and  then  freezes  them  before  a  new  pair 
of  points  along  the  reaction  path  is  added.  The  number  of  gradients  required  to  locate  a  guess  saddle  point 
using  FSM  is  rather  small. 

Since  the  reactant  and  product  geometries,  corresponding  to  initial  and  final  minima,  are  known  from  the 
matrix  representation,  energy  cutoff  and  force  field  optimization  described  above,  automatic  path-finding 
tools,  such  as  the  freezing  string  method,  can  characterize  a  reaction  coordinate  joining  the  end-points.  The 
highest  point  on  the  pathway  becomes  a  good  initial  guess  for  subsequent  refinement  of  the  transition 
structure  using  the  single-ended  method  such  as  Berny  algorithm. 

Summary  of  methodology  and  computational  details 

All  the  methods  just  described  can  be  combined  in  a  systematic  and  automated  procedure  for  identifying 
reaction  pathways.  It  is  summarized  in  the  flowchart  presented  on  Figure  1.  Only  initial  reactant(s)  geometry 
is  required  at  the  start.  It  is  converted  to  the  reactant  matrix  R.  Different  reaction  matrices  A  are  then 
generated,  which  break  up  to  N  bonds  and  make  up  to  M  bonds,  where  N  and  M  are  defined  by  the  user.  The 
number  of  possibilities  rises  combinatorially  with  N  and  M.  In  the  benchmark  examples  below,  we  constrain 
1  <  A/ <  4  and  1  <  /V7  <  4.  We  constrain  A/,  /V7 <  4  to  reduce  computational  demands  and  N,M>1  since  most  of 
those  simple  reactions  are  well  understood.  To  reduce  the  computational  effort  we  restricted  the  products 
to  those  which  can  be  drawn  with  a  Lewis  structure  with  a  fixed  number  of  valence  electrons  for  each  element 
-  4  for  carbon,  2  plus  two  lone  pairs  for  oxygen,  1  for  hydrogen,  3  plus  a  lone  pair  for  nitrogen. 
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In  the  present  work,  we  restrict  all  the  elements  Ry  and  Py  in  the  reactant  and  product  matrices  to  be  integers. 
This  can  potentially  lead  to  issues  for  products  with  resonant  structures.  We  also  demand  number  of  unpaired 
electrons  on  an  atom  not  exceed  1.  This  is  certainly  ruling  out  some  known  species,  e.g.  biradicals  such  as 
methylene  and  0(^P),  but  we  can  always  relax  this  restriction  and  include  them  in  future  studies.  Since  BE 
matrices  are  invariant  with  certain  permutations,  several  BE  matrices  can  correspond  to  chemically  equivalent 
compounds.  However,  in  order  not  to  omit  important  reaction  pathways,  all  combinations  are  taken  into 
account  initially.  We  remove  any  duplicates  before  the  molecular  mechanics  and  quantum  chemistry 
calculations  are  run. 

The  product  matrices  P  are  obtained  and  converted  to  block  structure  by  simple  linear  algebra  operations. 
Thermochemistry  calculations  are  used  to  narrow  the  search  space  -  during  this  step  'known'  and  unstable 
products  are  detected.  In  the  present  study,  a  reaction  is  considered  to  be  too  endothermic  to  be  interesting 
if  the  standard  enthalpy  of  reaction  (denoted  as  DH  in  Tables  2-5)  is  higher  than  20  kcal/mol.  These 
calculations  are  performed  using  the  Reaction  Mechanism  Generator  (RMG)  software  package  developed  in 
our  group. 

The  next  step  is  automated  saddle  point  search  for  'unknown'  reaction  pathways  which  lead  to  stable 
products.  For  quantum  chemistry  calculations,  geometries  of  reactant(s)  and  product(s)  are  required.  They 
are  generated  by  converting  corresponding  blocks  of  the  P  matrices  to  SMILES  strings  (standard  format  for 
molecule  mechanics  calculations)  and  quick  force  field  optimization  using  academic  version  of  the 
MarvinBeans  package  from  ChemAxon.^'*  Special  care  is  taken  of  the  conversion  between  different  molecule 
representation  formats,  eliminating  duplicate  channels.  In  the  present  study,  we  employ  the  MMFF94  force 
field  to  determine  the  geometry  of  the  products. 

The  BE  matrix  representation  is  identical  for  all  conformers.  However,  the  distinct  3d  structure  of  conformers 
strongly  affects  the  properties  and  the  reactions  of  molecules.  In  order  to  take  into  account  conformational 
effects,  we  used  the  MarvinBean's  Conformer  Plugin  to  generate  a  series  of  stable  3d  structures  (up  to  three 
in  the  present  study),  i.e.  conformers  for  each  compound.  If  more  than  one  product  is  formed  in  the  reaction 
channel,  their  structures  were  first  aligned  to  maximum  coincidence  with  the  reactant  structure  in  non-mass 
weighted  Cartesian  coordinates  and  then  the  length  of  the  vector  between  the  centers  of  mass  of  the 
products  was  increased  in  order  to  avoid  overlaps  between  atoms  in  different  product  molecules. 

For  the  determination  of  all  the  transition  states  we  used  the  FSM  implemented  in  the  Q-Chem  software 
package  with  default  spacing  parameters  (20  nodes,  6  perpendicular  gradients  per  node).^^  In  order  not  to 
repeat  computations  on  duplicate  structures,  all  input  structures  are  compared  prior  to  initiating  FSM 
calculations  and  identical  input  files  were  eliminated.  However,  chemically  equivalent  channels  with  distinct 
3d  geometries  were  not  excluded  from  the  procedure  in  order  not  to  omit  conformers  or  atom  arrangements 
which  could  potentially  lead  to  low-energy  reaction  pathways.  For  channels  where  FSM  found  an  apparent 
single  barrier  along  the  reaction  path  (without  any  restrictions  to  the  energy  barrier  height),  these  calculations 
were  followed  by  transition  state  search  using  the  Berny  optimization  algorithm  implemented  in  the  Gaussian 
03  software  package^®  starting  from  the  FSM  geometry.  For  each  detected  saddle  point,  we  also  perform 
intrinsic  reaction  path  (IRC)  calculations  which  go  downhill  from  the  transition  state  to  see  which  two  minima 
it  is  connected  to  and  to  verify  whether  this  saddle  point  corresponds  to  the  reaction  from  R  to  P. 

Geometry  optimization  and  reaction  path  analysis  for  the  examples  herein  were  performed  using  M062X/6- 
311++G*  level  of  theory,  while  single  point  energies  at  the  stationary  points  were  obtained  using  CCSD(T)/6- 
311++G*  level  of  theory.  Thermochemistry  of  the  detected  reactions  was  computed  using  CBS-QB3  level  of 
theory.  Note  that  the  proposed  procedure  is  not  dependent  on  the  use  of  any  particular  software  package  or 
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level  of  theory.  Our  choice  of  such  parameters  as  quantum  chemistry  methods,  basis  set,  force  fields,  FSM 
parameters,  number  of  conformers,  method  for  orientating  product  structures  should  be  considered  as 
reasonable  first  attempts  which  can  be  improved  in  the  future  as  the  community  gains  more  experience. 

Each  set  of  quantum  chemistry  calculations  -  structure  optimization,  energy  calculation  and  the  FSM 
calculation  -  is  performed  independently  in  parallel  to  exploit  the  full  potential  of  supercomputers.  All  the 
computations  are  automated,  to  reduce  time-to-completion  and  also  to  reduce  bias  related  to  manual 
intervention  to  help  the  computer  find  a  saddle  point,  and  which  so  tend  to  find  only  "human-expected" 
known  reactions.  In  order  to  demonstrate  how  such  fully-automated  procedure  can  accelerate  the  discovery 
of  new  chemistry,  we  use  it  to  study  several  systems  of  combustion,  oxidation  and  atmospheric  chemistry 
importance  which  span  a  reasonable  wide  space  of  chemical  reactivity. 


Results 

For  each  initial  structure.  Table  1  shows  the  number  of  product  channels  generated  using  the  BE 
representation.  The  results  of  thermochemistry  analysis  using  Benson  group  additivity  approach  of  all  product 
channels  are  also  given  in  Table  1.  All  reaction  pathways  identified  are  summarized  in  Figures  2-5  and  Tables 
2-5. 


1.  New  reactions  of  Ketohydroperoxides 

Ketohydroperoxides  are  the  main  source  of  radicals  during  the  first  stage  of  low-temperature  ignition,  and 
are  also  formed  in  liquid  phase  oxidation.  Combustion  parameters,  such  as  ignition  delay,  are  very  sensitive 
to  the  details  of  their  chemistry.^®  Most  previous  combustion  models  assumed  ketohydroperoxides  do  only 
one  reaction:  forming  radicals  by  breaking  the  weak  0-0  bond.  Flowever,  new  chemical  reactions  of 
ketohydroperoxides  which  transform  them  into  cyclic  peroxides  and  then  acids  and  carbonyl  molecules  has 
been  recently  characterized^®  suggesting  that  chemistry  of  ketohydroperoxides  is  more  complicated. 

Table  1  shows  that  almost  500  product  channels,  which  break  and  make  just  2  or  3  bonds,  are  possible  from 
this  small  molecule.  (Note  that  this  includes  distinct  reactions  which  form  the  same  products,  i.e.  there  is 
often  more  than  one  mountain  pass  connecting  two  valleys.  Without  repeated  channels,  this  number  reduces 
to  109.).  Flalf  of  the  channels  are  exothermic  and  almost  70  %  of  the  channels  have  AFIrxn<  20  kcal/mol.  At 
the  same  time,  only  one  (!)  product  channel  (l,2-dioxolan-3-ol)  is  reached  by  a  "well-known"  reaction  type 
available  in  the  current  RMG  database -the  first  step  of  the  Korcek  mechanism  forming  an  intermediate  cyclic 
peroxide.^®  As  shown  in  Figure  2  (Reaction  (1)),  our  simple  algorithm  was  able  to  locate  efficiently  transition 
states  for  this  'known'  reaction  though  it  involves  a  significant  rearrangement  of  atoms  including  cycle 
formation  and  hydrogen  transfer  between  two  oxygen  atoms.  Figure  2  also  shows  that  the  automatic  search 
found  transition  states  for  5  additional  reaction  pathways.  Note  that  one  of  these  channels  leads  directly  to 
the  products  of  the  Korcek  mechanism  -  Reaction  (4)  on  Figure  2.  Flowever,  Table  2  shows  that  the  energy 
barrier  (denoted  as  AE*)  for  this  channel  is  higher  by  almost  20  kcal  and  therefore,  rather  than  going  directly 
over  this  high  mountain  pass,  sequential  reactions  via  an  intermediate  valley  (the  cyclic  peroxide)  are 
expected  to  be  the  fastest  reaction  path.  This  channel  was  located  while  searching  for  a  reaction  path  to  a 
different  product  (see  Comments  in  Table  2).  This  example  illustrates  that  the  saddle-point  search  procedure 
used  in  this  work  does  not  guarantee  convergence  to  the  desired  transition  state.  As  a  result  of  its  non- 
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iterative  nature,  the  FSM  cannot  guarantee  conservation  of  the  exact  IRC  profile  and  therefore  further 
optimization  of  the  guess  structure  (the  highest  energy  node)  with  Berny  optimization  can  lead  to  structures 
which  do  not  correspond  to  the  initial  reaction  path.  The  calculations  also  occasionally  converged  to  saddle 
points  which  correspond  to  different  initial  reactants.  More  importantly,  the  majority  of  our  reaction  path 
searches  do  not  converge  to  a  saddle  point.  This  typically  occurred  because  the  energy  profiles  from  the  FSM 
calculations  for  these  paths  exhibited  several  very  high  barriers  -  not  the  single  elementary-step  transition 
state  we  were  looking  for  -  and  therefore  the  Berny-type  TS  search  was  not  performed.  In  these  cases,  it  is 
possible  that  no  single  saddle  point  exists  that  connects  reactant  and  product(s),  or  just  that  with  our  choices 
of  parameters  and  options  the  FSM  is  not  robust  enough  to  find  one.  Clearly  improved  reaction  path  search 
methods  are  needed.  Nonetheless  the  current  algorithm  is  sufficient  to  discover  several  new  unexpected 
chemical  reactions. 


2.  Cyclic  peroxides  and  ozonides 

Upon  examination  of  the  reactivity  of  l,2-dioxolan-3-ol  -  our  second  case  study  -  we  confirmed  that 
fragmentation  of  the  cyclic  peroxide  leads  to  two  possible  pairs  of  acid  and  aldehyde  products  (Reactions  (24) 
and  (25)  in  Figure  4)  previously  characterized  in  Ref.  16.  We  were  also  able  to  locate  the  reverse  pathway 
which  leads  to  the  initial  molecule  (Reaction  (22)  in  Figure  4)  confirming  that  the  proposed  procedure  is  rather 
sustainable.  The  fact  that  the  fully  automated  search  reproduces  the  results  of  the  previous  manual  studies 
of  the  Korcek  mechanism  -  and  also  discovers  a  half-dozen  additional  transition  states  to  products  not  found 
by  that  manual  search  -  is  encouraging. 

Another  interesting  example  is  the  chemistry  of  the  Criegee  intermediate  with  formaldehyde  (third  case 
study).  These  two  species  quickly  recombine  forming  the  more  stable  secondary  ozonide  intermediate 
(ethylene  ozonide)[38]  whose  reactivity  was  investigated  in  the  present  work.  All  three  reaction  pathways 
identified  for  ethylene  ozonide  are  presented  in  Figure  3  (Reactions  (15)-(17))  and  Table  3.  In  accordance  with 
the  previous  manual  search  [38],  we  find  that  two  low  energy  pathways  for  ethylene  ozonide  decomposition 
are  formation  of  hydroxylmethyl  formate  by  breaking  the  weak  0-0  bond  (the  lowest  energy  saddle  point) 
and  formation  of  formaldehyde  and  formic  acid.  In  addition  to  the  reactions  found  in  Ref.  38,  we  also  find 
that  ethylene  ozonide  can  decompose  to  formaldehyde  and  oxirane  with  similar  barrier  height. 

3.  Nitrogen-containing  Analogues 

Table  1  shows  that  the  chemistry  of  the  N-containing  analogous  compounds  is  even  less  known  -  no 
unimolecular  channels  with  2  and  3  bonds  breaking/forming  were  found  in  the  RMG  database.  Flowever,  we 
were  able  to  identify  and  characterize  several  reaction  pathways  using  our  automated  procedure.  The  results 
are  summarized  in  Tables  2-4  and  Figures  2-4.  They  show  that  the  chemistry  for  these  molecules  is  similar  to 
the  first  three  structures.  For  example.  Reaction  (7)  is  similar  to  the  first  step  of  the  Korcek  mechanism 
(Reaction  (1)).  Flowever,  replacing  one  CFI2  group  with  an  NFI  group  significantly  enriches  the  chemistry.  For 
example.  Reaction  (9)  involves  two  Fl-transfers  (transition  state  shown  in  Figure  2)  and  Reactions  (11)  and 
(12)  form  3  products  from  one  reactant. 

Table  2  shows  that  in  the  case  of  N-(hydroperoxymethyl)  formamide,  half  of  the  identified  saddle  points  do 
not  correspond  to  the  intended  product(s)  generated  by  the  BE  matrix  representation.  Due  to  initial  open- 
chain  structure,  all  these  cyclization  reactions  require  significant  rearrangement  of  several  atoms,  and 
apparently  FSM  has  trouble  with  these  cases.  Similarly,  the  algorithm  failed  to  find  several  cyclization 
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transition  states  of  the  /-ketohydroperoxide.  At  the  same  time,  all  saddle  points  detected  for  more 
constrained  cyclic  reactants  1,2,4-dioxazolidine  and  l,2,4-dioxazolidin-3-ol  are  for  the  intended  products  (see 
Tables  3  and  4). 

4.  1,4-Pentadiene 

As  a  final  example,  the  chemical  reactivity  of  1,4-pentadiene  is  investigated  (see  Table  5  and  Figure  5).  Due  to 
many  possible  single  and  double  hydrogen  atom  transfers,  many  product  channels  are  possible  in  this  system. 
Table  1  shows  that  total  number  of  channels  exceeds  seven  hundred.  However,  out  of  this  number,  only  8 
saddle  points  were  located  and  3  of  these  correspond  to  known  reaction  types  in  the  RMG  database.  All 
structures  were  connected  to  the  initial  molecule  by  a  large  barrier  (higher  than  60  kcal/mol).  Nevertheless, 
the  ability  of  our  algorithm  not  only  to  reproduce  the  existing  'known'  reaction  pathways  but  also  to  identify 
5  new  reaction  pathways  for  this  simple  molecule  is  very  promising. 

Conclusions 


The  algorithm  proposed  here  for  finding  elementary  reaction  paths  possesses  several  important  features:  it 
requires  no  human  intervention  and  it  does  not  require  any  information  on  the  chemical  reactivity  of  the 
reactant(s).  In  order  to  detect  saddle  points  it  generates  a  series  of  product(s)  and  tries  to  find  pathways  that 
connect  the  product(s)  to  the  initial  structure  -  this  would  be  tedious  to  do  without  automation.  It  is  much 
more  computationally  efficient  (i.e.  requires  fewer  energy/gradient  calculations)  than  methods  based  on  ab 
initio  molecular  dynamics,  since  it  rapidly  finds  and  focuses  on  the  structures  near  the  saddle  point  connecting 
the  reactant  with  designated  products.  The  present  examples  show  that  our  automated  algorithm  is  able  to 
detect  not  only  previously  detected  reaction  pathways  but  also  several  new,  previously  unknown,  types  of 
reactions.  Our  constraint  that  allows  only  two  or  three  bonds  to  be  formed  and  broken  in  the  same 
elementary  step  worked  well  for  the  present  examples,  keeping  the  number  of  possible  product  channels 
manageable  (<  1000  possible  reaction  paths  per  reactant)  but  still  allowing  us  to  discover  a  lot  of  new 
chemistry.  The  present  results  also  show  that  detecting  thermodynamically  feasible  products  does  not 
guarantee  kinetic  feasibility.  The  degree  of  success  in  locating  unexpected  kinetically  favorable  elementary 
steps  is  rather  low-  a  significant  number  of  thermodynamically  stable  structures  either  are  connected  to  the 
initial  structure  only  through  multiple  elementary  steps  or  via  chemically  infeasible  routes.  In  future,  some  of 
these  routes  which  are  the  most  obviously  unlikely,  such  as  breaking  the  C=0  bond,  might  be  eliminated  in 
order  to  focus  resources  on  discovering  transition  states  for  more  likely  reactions.  Another  possibility  is  to 
freeze  certain  bonds  and  angles  that  are  expected  to  be  unreactive  in  order  to  reduce  the  number  of 
conformers  and  matrices  dimensionality.  However,  such  limitations  should  be  imposed  with  caution  as  they 
may  prohibit  detecting  unexpected  important  reaction  pathways! 

The  success  of  the  exact  search  for  saddle  point  relies  on  the  ability  to  generate  very  accurate  corresponding 
guess  structures  from  a  double-ended  method,  such  as  the  freezing  string  method  (FSM)  employed  in  the 
present  study.  FSM  can  be  considered  as  one  of  the  fastest  double-ended  methods.  However,  FSM  is  not 
guaranteed  to  converge  to  the  reaction  path  of  interest  and  therefore  convergence  to  the  correct  saddle  point 
cannot  be  rigorously  guaranteed. For  each  detected  transition  state  we  performed  the  intrinsic  reaction 
coordinate  (IRC)  integration  calculation  to  recover  both  reactant  and  product  sides.  In  several  cases,  we  found 
that  the  Berny  optimization  of  the  highest  node  along  the  reaction  path  generated  by  FSM  leads  to  wrong 
transition  states,  which  do  not  connect  the  specified  reactant  and  product  states.  Therefore,  a  good  initial 
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guess  of  the  transition  state  remains  one  of  the  major  challenges  in  the  use  of  such  algorithms  for  complex 
systems.^® 

As  an  alternative  to  FSM,  its  predecessor,  the  growing  string  method  (GSM),  an  iterative  algorithm  that 
connect  the  reactant  and  the  product  via  the  IRC,  can  be  implemented.  Recent  advances  in  the  growing  string 
method  have  allowed  it  to  locate  the  exact  transition  state  for  a  given  reaction  path.^®'''°  A  procedure  for 
automatic  reaction  path  finding  which  is  based  on  GSM  has  been  recently  proposed. However,  while 
formally  very  attractive,  the  GSM  is  substantially  more  computationally  expensive  compared  to  the  cost  of 
FSM  calculation  (by  up  to  a  factor  of  10),  and  in  this  work  the  FSM  calculations  are  the  limiting  computational 
step  in  the  algorithm.  Moreover,  even  GSM  can  fail  to  locate  a  saddle  point  for  a  single  elementary  step  as  no 
double-ended  string  method  is  perfectly  reliable.  An  important  point  of  the  present  testing  is  that  FSM  is  able 
to  discover  several  important  transition  states  recently  discovered  by  manual  searches. However,  future 
improvement  of  reliability  as  well  as  reducing  computational  costs  of  double-ended  methods  is  obviously 
required. 
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Figure  1.  Flowchart  for  automated  discovery  of  elementary  chemical  reaction  steps  using  molecular 
mechanics  and  quantum  chemistry  calculations.  See  "Theory"  section  for  details  of  each  step. 
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Figure  2.  Automatically  identified  unimolecular  reactions  of  y-ketohydroperoxide  (00CCC=0)  (1-6) 
and  N-(hydroperoxymethyl)  formamide  (00CNC=0)  (7-14).  Reactions  (3),  (4),  (9)-(12)  are  unusual 
and  unexpected. 
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Figure  3.  Automatically  identified  unimolecular  reactions  of  ethylene  secondary  ozonide 
(OlCOOCl)  (15-17)  and  1,2,4-dioxazolidine  (NlCOOCl)  (18-21).  Reactions  (17)  and  (21)  are 
particularly  unusual  and  unexpected. 
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Figure  4.  Automatically  identified  unimolecular  reactions  of  l,2-dioxolan-3-ol  (OCIOOCCI)  (22-25) 
and  l,2,4-dioxazolidin-3-ol  (OCIOOCNI)  (26-36).  Reactions  (30)-(33)  and  (35)  are  particularly 
unusual  and  unexpected. 
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Figure  5.  Automatically  identified  unimolecular  reactions  of  1,4-pentadiene  (C=CCC=C)  (37-44). 
Reactions  (38)-(40)  and  (43)  are  particularly  unusual  and  unexpected. 
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Table  1.  Number  of  elementary  steps  investigated  for  different  initial  reactants. 


Initial  reactant 
(SMILES) 

Total  number 
of  product 
channels 

identified 

Number  of 
low  energy  channels 
(AHr°(298K)<  (AHr°(298K) 

Okcal/mol)  <20kcal/mol) 

Number  of 
automatically 
identified  transition 

states 

(versus  existing  RMG 
database)'’ 

y-ketohydroperoxide 

77" 

39" 

53" 

6  (RMG 

(00CCC=0) 

415‘’ 

204*’ 

276*’ 

database:!) 

N- 

62" 

22" 

32" 

8  (RMG  database:0) 

( hyd  ro  pe  roxy  m  et  hy  1 ) 

293^^ 

84*’ 

123*’ 

formamide 

(00CNC=0) 

ethylene  ozonide 

42" 

23" 

35a 

3  (RMG  database:0) 

(OlCOOCl) 

rsi 

00 

59*’ 

80*’ 

1,2,4-dioxazolidine 

56" 

35" 

49" 

4  (RMG  database:0) 

(NlCOOCl) 

190‘’ 

137*’ 

159*’ 

l,2-dioxolan-3-ol 

90" 

40" 
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4  (RMG  database:2) 

(OCIOOCCI) 

533‘’ 

233*’ 

344*’ 

l,2,4-dioxazolidin-3-ol 

74" 

43" 

59" 

11  (RMG 

(OCIOOCNI) 

378^^ 

191*’ 

275*’ 

database:0) 

penta-l,4-diene 

96" 

45" 

64" 

8  (RMG  database:  3) 

(C=CCC=C) 

652‘’ 

197*’ 

341*’ 

^  Number  of  breaking  bonds  =  2;  Number  of  forming  bonds  =  2 


‘’Number  of  breaking  bonds  =  3;  Number  of  forming  bonds  =  3 
from  http://rmg.mit.edu 
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